HDFS Commands

hadoop fs -ls:- Lists the contents of the directory specified by path, showing the names, permissions, owner, size and modification date for each entry.
hadoop fs -ls <hdfs destination path>
hadoop fs -ls -R:- Behaves like -ls, but recursively displays entries in all sub-directories of path.
hadoop fs -lsr
hadoop fs -lsr /
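Note:- On Hadoop 2.x and later, -lsr is deprecated; the equivalent recursive listing is:
hadoop fs -ls -R /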

hadoop fs -mkdir:- Creates a directory in HDFS.
hadoop fs -mkdir <directory name>
hadoop fs -mkdir new_folder
hadoop fs -mkdir -p /test/parent
-p : Creates parent directories as needed and does not fail if the directory already exists.
hadoop fs -mkdir hdfs://quickstart.cloudera:8020/user/test2


hadoop fs -cat:- Displays the contents of a file on stdout.
hadoop fs -cat <hdfs file path>
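For example, to print the file used in the -put and -get examples below (the path is illustrative):
hadoop fs -cat data/emp1.txt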

hadoop fs -put:- Copies the file or directory from the local file system to HDFS.
  • hadoop fs -put /home/cloudera/Desktop/data/emp1.txt  data/sqoop_data3
  • hadoop fs -copyFromLocal /home/cloudera/Desktop/data/emp1.txt data/


Note:- To overwrite an existing file in HDFS, use the -f flag.
  • hadoop fs -put -f /home/cloudera/Desktop/data/emp1.txt  data/sqoop_data3
hadoop fs -get:- Copies the file or directory from HDFS to the local file system path.
hadoop fs -get data/emp1.txt   /home/cloudera/Desktop/data/export
hadoop fs -moveFromLocal:- Copies the file or directory from the local file system to HDFS, then deletes the local copy on success.
hadoop fs -moveFromLocal   /home/hduser/getData/abc.txt   data/
hadoop fs -copyFromLocal:- Copies the file or directory from the local file system to HDFS. Unlike -moveFromLocal, the local copy is left in place; it is similar to -put, except that the source is restricted to a local file reference.
Usage: hadoop fs -copyFromLocal <localsrc> URI
  • -p : Preserves access and modification times, ownership and permissions (assuming the permissions can be propagated across filesystems).
  • -f : Overwrites the destination if it already exists.
  • -l : Allows the DataNode to lazily persist the file to disk and forces a replication factor of 1. This flag results in reduced durability; use with care.
  • -d : Skips creation of the temporary file with the suffix ._COPYING_. (A combined example follows this list.)
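A sketch combining these options (the paths reuse the earlier illustration):
hadoop fs -copyFromLocal -f -p /home/cloudera/Desktop/data/emp1.txt data/
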
hadoop fs -copyToLocal:- Copies the file or directory from HDFS to the local file system. Usage: hadoop fs -copyToLocal [-ignorecrc] [-crc] URI <localdst>

Similar to the get command, except that the destination is restricted to a local file reference.
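For example, mirroring the -get example above (paths are illustrative):
hadoop fs -copyToLocal data/emp1.txt /home/cloudera/Desktop/data/export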

hadoop fs -cp:- Copies the file or directory within HDFS.
hadoop fs -cp data/abc.txt data1/
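On current releases -cp also accepts -f to overwrite an existing destination, e.g. with the same illustrative paths:
hadoop fs -cp -f data/abc.txt data1/
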
hadoop fs -mv:- Moves the file or directory within HDFS.
hadoop fs -mv data/abc.txt data1/

rm :- This command is similar to the UNIX rm command, and it is used for removing a file from the HDFS file system. On older releases the command -rmr deleted files recursively; on current releases use -rm -r instead.

hadoop fs -rm:- Removes the file identified by path. On current releases a directory, even an empty one, requires -rm -r (or -rmdir).
hadoop fs -rm abc.txt


hadoop fs -rm -r:- Removes the file or directory identified by path. Recursively deletes any child entries (i.e., files or sub-directories of path).

hadoop fs -rm -r /data

hadoop fs -rm -r -skipTrash:- Bypasses the trash and immediately deletes the source.
-f : Do not report an error or change the exit status if the file does not exist.
-r (or -R) : Recursively deletes directories and their contents; an example combining these options is shown below.
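A sketch that force-deletes a directory tree without moving it to the trash (the path is illustrative):
hadoop fs -rm -r -f -skipTrash /data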

hadoop fs -touchz:- Creates a zero-length file. An error is returned if the file already exists with non-zero length.
hadoop fs -touchz /bbb

hadoop fs -du:- Displays the sizes of files and directories contained in the given directory, or the length of a file in case it's just a file.
hadoop fs -du /data

hadoop fs -du -s:-Displays a summary of file lengths.
hadoop fs -du -s / 
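Both forms accept -h (a standard flag on recent releases) to print sizes in a human-readable format:
hadoop fs -du -s -h /data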

hadoop fs -getmerge:-Retrieves all files that match the path src in HDFS, and copies them to a single, merged file in the local file system.

hadoop fs -getmerge /a* merge

hadoop fs -getmerge -nl <source file path> <local system destination path>
The getmerge command has three parameters:
  • <src files> is the HDFS path to the directory that contains the files to be concatenated
  • <dst file> is the local filename of the merged file
  • [-nl] is an optional parameter that adds a newline character at the end of each merged file.
hadoop fs -getmerge -nl f1.txt f2.txt merge_file

hadoop fs -appendToFile:- Takes one or more files from the local file system and appends them to the destination HDFS file.
Usage: hadoop fs -appendToFile <localsrc> ... <dst>
Ex: hadoop fs -appendToFile a* /appfile
hadoop fs -appendToFile /home/tsipl1038/data/d1 /user/tsipl1038/
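When <localsrc> is -, the input is read from stdin; a minimal shell sketch (the HDFS path is illustrative):
echo "extra line" | hadoop fs -appendToFile - /appfile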

hadoop fs -expunge:- Empties the trash.
hadoop fs -expunge

hadoop fs -help:- Returns usage information for any of the fs commands.
hadoop fs -help get
hadoop fs -usage get

hadoop fs -text:- Outputs a compressed or encoded source file in text format (for example, gzip files and SequenceFiles).
hadoop fs -text /f1.log.gz

hadoop fs -test:- Tests an HDFS path for existence, zero length, or whether it is a directory; the result is reported through the exit status, as shown in the example after this list.
hadoop fs -test -e /a2.log
-d : used to check whether the path is a directory; returns 0 if it is a directory.
-e : used to check whether the path exists; returns 0 if it exists.
-f : used to check whether the path is a file; returns 0 if the file exists.
-s : used to check whether the file size is greater than 0 bytes; returns 0 if the size is greater than 0 bytes.
-z : used to check whether the file is zero length; returns 0 if the size is 0 bytes.
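Because -test only sets the exit status, it is typically combined with a shell check, e.g.:
hadoop fs -test -e /a2.log && echo "exists"
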
hadoop fs -count:- Count the number of directories, files and bytes under the path.
hadoop fs -count /a2.log
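The output columns are, in order, DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME; the standard -q option reports quota information as well:
hadoop fs -count -q /data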

hadoop fs -setrep:- Changes the replication factor of a file. The -R option recursively changes the replication factor of files within a directory, and -w waits for the replication to complete (which can take a long time).
hadoop fs -setrep -w 2 /a2.log

hdfs fsck:- Checks the health of the Hadoop file system. Note that fsck is an hdfs subcommand, not a hadoop fs option.
hdfs fsck /

To find the number of blocks a file in HDFS is divided into:
hdfs fsck /path/to/file -files -blocks
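Adding the standard -locations and -racks options also reports which DataNodes and racks hold each block:
hdfs fsck /path/to/file -files -blocks -locations -racks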

hadoop fs -stat:- Usage: hadoop fs -stat [format] <path>

Prints statistics about the file/directory at <path> in the specified format. The format accepts file size in blocks (%b), type (%F), the group name of the owner (%g), name (%n), block size (%o), replication (%r), the user name of the owner (%u), and modification date (%y, %Y). %y shows the UTC date as “yyyy-MM-dd HH:mm:ss” and %Y shows milliseconds since January 1, 1970, UTC. If the format is not specified, %y is used by default.

hadoop fs -stat “%F %u:%g %b %y %n” /file

DistCp:- DistCp (distributed copy) is a tool used for large inter/intra-cluster copying. It uses MapReduce to effect its distribution, error handling and recovery, and reporting. It expands a list of files and directories into input to map tasks, each of which copies a partition of the files specified in the source list. Its MapReduce pedigree has endowed it with some quirks in both its semantics and execution.
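A minimal sketch of the most common use, copying between clusters, where nn1 and nn2 stand in for the source and destination NameNode hosts:
hadoop distcp hdfs://nn1:8020/foo/bar hdfs://nn2:8020/bar/foo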
